Enable HTTP/2 support in kube-client#1823
Conversation
719aab8 to
0b5cdf2
Compare
|
wow, really?! i wasn't aware that the apiserver supported http2 for this, but sure enough, there are flags for it. I do see it gives us some |
I'll take a look over the regressions, but it won't be immediately. |
|
Appreciated. No rush. From a quick googling around, this comment in client-go possibly implies some mutual exclusivity in usage, and perhaps we need to set Request::version_mut to override back to 1.1 for websocket stuff in kube-core's |
|
hey there. do you think you'll have time to come back at this at some point? i can open an issue for someone else to tackle if not. |
I’d like to say “yes”, but realistically it may be 1 to 2 months before I can get back to this. For context: this work matters to me because I’ve been building a tool called kat (tailing Kubernetes pod logs). Its default behaviour is to tail all containers across all pods and namespaces, and without connection reuse (HTTP/2) it’s easy to run into file-descriptor limits on some clusters I connect to. So I definitely care about getting HTTP/2 right here - I’m just not sure when I’ll be able to actively work on it again. If someone else is able to pick this up in the meantime, that would be hugely appreciated. If not, I may be able to return to it towards late March / April. |
|
Thanks for the response. I have opened #1923 for now to let people experiment if the interest in there. |
|
I hope to start looking at this again in the next few weeks. 🤞 |
Add http2 feature to hyper, hyper-util, and hyper-rustls to enable HTTP/2 ALPN support in the kube-client. This complements the enable_http2() API call to provide full HTTP/2 multiplexing capability for Kubernetes API connections. Signed-off-by: Andrew McDermott <aim@frobware.com>
This change adds HTTP/2 ALPN support to the rustls HTTPS connector by calling enable_http2() in addition to enable_http1(). This enables HTTP/2 multiplexing for Kubernetes API connections, reducing the number of TCP connections from potentially hundreds to a single multiplexed connection per API server. Previously, kube-rs only supported HTTP/1.1 connections, resulting in a new TCP connection for each concurrent API request. With HTTP/2 enabled, multiple requests can be multiplexed over a single connection, improving performance and reducing resource usage. Signed-off-by: Andrew McDermott <aim@frobware.com>
|
Sorry for the silence here -- finally came back to it. The branch is rebased on The shape is two transports inside One thing that bit me along the way: hyper-rustls' openssl ALPN parity I've left for a follow-up -- openssl was already h1-in-practice (no ALPN at all), so it's not touched here. One breaking change worth calling out: Important caveat: this has had genuinely minimal exposure. It is largely a stab in the dark to see if I could get CI green. Locally I've run the integration suite, the upgrade-path examples ( |
The previous commit on this branch enabled HTTP/2 on the rustls connector by switching ALPN from `enable_http1()` alone to `enable_http1().enable_http2()`. REST traffic, watch streams and log streaming benefit from HTTP/2 multiplexing, but exec, attach and port-forward use HTTP/1.1 connection upgrades, which are unrepresentable on an HTTP/2 connection. After ALPN negotiates `h2`, the apiserver answered upgrade requests with 4xx/5xx and the integration suite started failing with `UpgradeConnection(ProtocolSwitch(...))`. Fix this by giving `Client` two transports with separate connection pools and ALPN policies: * a primary, h2-capable transport for normal traffic, configured with `TokioTimer` and HTTP/2 keep-alive PINGs (interval 30s, while-idle on) so watch streams survive idle-killing intermediaries such as HAProxy; * an HTTP/1.1-only transport used by `Client::connect()`, the only caller of `hyper::upgrade::on` in the workspace, hard-routed there via a new `upgrade_inner` field on `Client`. A subtle point governs the h1-only path on rustls. hyper-rustls' builder asserts that `alpn_protocols.is_empty()` when accepting a rustls config, and only `enable_http2()` populates the list. The `enable_http1()`-only builder path therefore leaves ALPN empty, which means no ALPN extension is sent on the wire and a modern apiserver may still negotiate HTTP/2 -- the very condition we are trying to avoid. The h1-only sibling must advertise `http/1.1` *explicitly*. The natural workaround -- `HttpsConnector::from((http, Arc::new(rustls_config)))` with `alpn_protocols = vec![b"http/1.1".to_vec()]` -- bypasses the builder's assertion but also drops `Config::tls_server_name`, because the `From` impl constructs the connector with the default server-name resolver and the resolver field is private. Rather than depend on an upstream hyper-rustls change, ship a small `H1OnlyHttpsConnector<H>` in `kube-client/src/client/tls.rs` that mirrors the public TCP-then-TLS dance from `hyper_rustls::HttpsConnector::call`, sets the explicit ALPN advertisement, and resolves SNI via `Config::tls_server_name` when set or from the URI host otherwise. This adds `tokio-rustls` as a direct dep gated on `rustls-tls` (already in the dep tree transitively via hyper-rustls). Other invariants worth preserving: * `Config::auth_layer()` and `Config::extra_headers_layer()` are computed once and *cloned* into both transport stacks. Calling `auth_layer()` twice would mint independent `RefreshableToken` state, so each transport would refresh tokens on its own and could diverge under exec-plugin or token-file auth. `AuthLayer` gains `#[derive(Clone)]` for this; the inner `Either` was already trivially cloneable. * `Config` gains a public `disable_http2: bool` field as a runtime escape hatch. When set, the primary transport falls back to the h1-only client; the two-client shape stays the same so the upgrade path is unaffected. HTTP/2 stays on by default; the field matches the existing `disable_compression` style. * `ClientBuilder` grows `with_upgrade_service`, paired with a new `Client::new_with_upgrade` constructor for custom-service users who supply their own service stack and need to opt in to the split. The single-service `Client::new` keeps its signature; it internally clones the buffered handle into `upgrade_inner` so back-compat is preserved. Verified against k3d v1.34.1 with the integration suite, the `pod_exec`, `pod_attach`, `pod_portforward*`, `pod_shell*` and `log_stream` examples, and a full 1200-combo `cargo hack` feature powerset. Signed-off-by: Andrew McDermott <aim@frobware.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1823 +/- ##
=======================================
+ Coverage 76.5% 76.5% +0.1%
=======================================
Files 89 89
Lines 8658 8772 +114
=======================================
+ Hits 6616 6704 +88
- Misses 2042 2068 +26
🚀 New features to boost your workflow:
|
|
Pointed Worth being precise about the topology. There are two HAProxies in the cluster but only one is in kat's data path: the external defaults
log global
option dontlognull
timeout connect 10s
timeout client 1h
timeout server 1h
timeout tunnel 1h
frontend fe_kubeapi
mode tcp
bind :6443
tcp-request inspect-delay 5s
tcp-request content accept if { req.ssl_hello_type 1 }
# SNI-based routing to per-cluster backendsSo this run doesn't exercise an L7 HAProxy with an aggressive It's a useful data point that the multiplexing savings hold for hours of real traffic, with no errors or stalls. For an extra poke at robustness I SIGSTOP'd the process for a while to let the kernel buffer up, then A separate thing worth discussing: the keep-alive interval is currently hard-coded to 30s in |
|
A bit of belt-and-braces while I'm here -- corroborating the apiserver's protocol behaviour with curl, independent of any kube-rs code, so the assumptions the PR rests on are visible. Two probes against the same external HAProxy LB: once with the per-cluster apiserver hostname (the path kat actually takes) and once with the LB's own hostname (to demonstrate the SNI policy in the rejection direction). Probe 1: per-cluster apiserver via the LB
These three rows mirror the two ALPN advertisements this PR makes ( Verbose for the Probe 2: the LB hostname directly (negative test)
Verbose for The HAProxy SNI map only forwards traffic for known per-cluster SNIs ( |
Signed-off-by: Eirik A <sszynrae@gmail.com>
|
Thanks a lot for this. I think it looks very reasonable at a high level view. Will have to dig in a bit more on some of these bits (as time allows).
Not a problem. A Have resolved some tiny Cargo.toml conflicts on your branch as a result of 819d08a. (feel free to over smash over it, but this stuff is squash-merged at the end anyway) |
| // transports. Calling `auth_layer()` twice would mint independent | ||
| // `RefreshableToken` state per transport, so each path would refresh | ||
| // tokens on its own and they'd diverge under exec-plugin or token-file | ||
| // auth. |
There was a problem hiding this comment.
really appreciate these detailed comments here.
helps the complexity go down a little more easily.
| let mut connector = TimeoutConnector::new(connector); | ||
| connector.set_connect_timeout(config.connect_timeout); | ||
| connector.set_read_timeout(config.read_timeout); | ||
| connector.set_write_timeout(config.write_timeout); |
There was a problem hiding this comment.
The write timeout here is default 295s which is a number very specific to long watches. I am not sure it makes sense for websocket calls. But I don't have a good recommendation either yet.
|
I started on but haven't pushed the commit yet as a) it was getting late and b) ZERO testing. |
This change enables HTTP/2 support in the client, allowing multiple requests to multiplex over a single connection. The implementation simply adds the http2 feature flag to hyper and its related dependencies, then enables HTTP/2 alongside HTTP/1 in the rustls connector configuration.
I noticed that I had 400+ open connections to the API server when streaming logs from all pods in my cluster, each requiring its own TCP handshake and TLS negotiation. With HTTP/2 enabled, I now only have a single connection handling all those streams, eliminating the overhead and avoiding potential resource exhaustion from hitting connection limits.